Information presentation apparatus with meta-information management function

ABSTRACT

High-speed information collection is achieved by utilizing a meta-information management unit which manages update information pertaining to documents which are retrieved by web robots. An information presentation apparatus, an information collection apparatus including a web robot, and a client are connected via a network. The information presentation apparatus presents document information stored in a document storage unit to the information collection apparatus. The information presentation apparatus has a meta-information management unit, which generates update information pertaining to individual documents, and a meta-information table which records the update information. When the web robot makes a collection request to the information presentation apparatus, the meta-information management unit references the meta-information table and generates a list of updated documents from all of the stored documents and/or collection targets, and presents the list to the web robot, which subsequently retrieves only the updated documents.

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] The present application claims priority under 35 USC §119 from Japanese Patent Publication No. 10-008416, the disclosure of which is incorporated herein by reference. The present application is a continuation of U.S. application Ser. No. 09/127,954, filed Aug. 3, 1998, now pending, the disclosure of which is incorporated herein by reference.

BACKGROUND OF THE INVENTION

[0002] The present invention concerns an information presentation apparatus equipped with a meta-information management function which allows an information collection apparatus to efficiently collect information.

[0003] In recent years, a decentralized hypertext system known as the World Wide Web (“WWW”) has become popular and has proliferated rapidly. The main, public portion of the WWW is carried by the increasingly popular Internet, but smaller private subsets may be formed in LANs and are called Intranets. The growth of the public WWW has been exponential, such that an enormous amount of information is now presented on the WWW. The WWW comprises a plurality of WWW servers which provide information, and clients (termed “browsers”) which are used to access the information. A single WWW server typically manages a plurality of “web pages” joined together by links. A user accesses (or “surfs”) information on the WWW using a browser by following the links to different web pages.

[0004] A “search engine” is often used to search information in the web pages on the WWW. A search engine effects a search function by using an information collection apparatus, termed a “web robot”, which collects information provided by a WWW server and then prepares an index on the collected information.

[0005] Typically, a web robot collects web page information by accessing all of the web pages on the WWW (managed by numerous WWW servers) one page at a time by following the links in each page. Because WWW server information is updated daily, a web robot must periodically access each WWW server to gather the information required for a search. Heretofore, information has been collected by accessing all web pages regardless of whether the content of the web page has been updated. In other words, the web robot retrieves each and every page each time it is run regardless of whether it had retrieved the page before.

[0006] When a web robot sequentially accesses all the web pages that a WWW server manages, it places a large burden on the WWW server by continuously connecting to and accessing web pages on the WWW server. At the same time, the web robot collects a great quantity of information and therefore causes increased network traffic. Additionally, when the server stores a great number of web pages, an enormous amount of time is required to cycle through all of the web pages, causing a delay in updating the data used by the search engine. Thus, depending on the search engine, it may be impossible to search the most recent information.

SUMMARY OF THE INVENTION

[0007] It is an object of the present invention to alleviate the burden on information presentation apparatus and associated networks while affording information collection at a high speed by furnishing a meta-information management unit which manages update information on documents in an information presentation apparatus.

[0008] Additional objects and advantages of the invention will be set forth in part in the description which follows, and, in part, will be obvious from the description, or may be learned by practice of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

[0009] These and other objects and advantages of the invention will become apparent and more readily appreciated from the following description of the preferred embodiments, taken in conjunction with the accompanying drawings of which:

[0010] FIG. 1 is a high level block diagram of the present invention;

[0011] FIG. 2 is a block diagram of a preferred embodiment of the present invention;

[0012] FIG. 3 is an example of a meta-information table for use with the preferred embodiment of the present invention;

[0013] FIG. 4 is another example of a meta-information table for use with the preferred embodiment of the present invention;

[0014] FIG. 5 is yet another example of a meta-information table for use with the preferred embodiment of the present invention;

[0015] FIG. 6 is an example of a collected documents table for use with the preferred embodiment of the present invention;

[0016] FIG. 7 is an example of an information collection apparatus table for use with the preferred embodiment of the present invention;

[0017] FIG. 8 is a flowchart of an information collection process in accordance with a preferred embodiment of the present invention; and

[0018] FIG. 9 is a flowchart of another information collection process in accordance with a preferred embodiment of the present invention.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0019] Reference will now be made in detail to the present preferred embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to like elements throughout.

[0020] FIG. 1 is a high level block diagram of the present invention. The present invention generally comprises an information presentation apparatus 1 connected to an information collection apparatus 2 and a client application 3 via a network 4.

[0021] The information presentation apparatus 1 generally comprises: a meta-information management unit 1 a; a meta-information table 1 b which stores update information pertaining to documents managed by the information presentation apparatus 1; a document storage unit 1 d that stores the documents; and an information collection table 1 c. The meta-information management unit 1 a manages document update information for documents stored in the document storage unit 1 d and references the meta-information table 1 b when an information collection request is made by the information collection apparatus 2.

[0022] Update information pertaining to documents stored in the meta-information table 1 b may include: the time of document updating, version information, and serial numbers indicating an update sequence. In general, the meta-information table 1 b indicates which documents have been modified and at which time the documents were so modified. The term “modify” refers to the creation, updating or deletion of a document. The meta-information management unit 1 a generates and transmits a stored document list (including update information or, more simply, a list of documents updated since the previous request) in response to an information collection request from the information collection apparatus 2.

[0023] The information collection table 1 c registers the name of each information collection apparatus 2 which issues information collection requests. Using the names of information collection apparatus 2 registered in the information collection table 1 c, the meta-information management unit 1 a may request each information collection apparatus 2 to issue an information collection request when there is any change in the meta-information table 1 b. In other words, the information collection table 1 c allows information collection to be carried out more efficiently, because each registered information collection apparatus 2 is asked to collect information only when the meta-information table 1 b has changed.

[0024] In the present invention, the update information in the meta-information table 1 b allows information to be collected efficiently by the information collection apparatus 2. The information collection apparatus 2 no longer carries out an information collection process for information that has not been updated (i.e., information and documents already collected by the information collection apparatus 2) thereby relieving a large burden on the information presentation apparatus 1. Additionally, because the effectiveness of information collection is raised, a search engine based on the information collection apparatus 2 can also provide newer information to the user.

[0025] FIG. 2 is a block diagram of a preferred embodiment of the present invention. The preferred system generally comprises an information presentation apparatus 11 (such as a WWW server), an information collection apparatus 12 (such as a web robot) and a client application 13 (such as a “web browser”). The information presentation apparatus 11, the information collection apparatus 12, and the client application 13 are connected via a network 14 (such as the Internet or an Intranet).

[0026] The information presentation apparatus 11 generally comprises a document storage unit 21 (such as a hard disk, magneto-optical disk or CD-ROM), a meta-information management unit 22 (typically embodied in software), a meta-information storage unit 23 (such as a hard disk, magneto-optical disk or CD-ROM) having a meta-information table 23 a (typically formed in software), an information collection apparatus storage unit 24 (such as a hard disk, magneto-optical disk or CD-ROM) having an information collection table 24 a (typically formed in software), and a data transmission/reception unit 25 (typically comprising a network adaptor, modem and/or associated software).

[0027] During normal operation, periodically, or when a document has been updated, the meta-information management unit 22 accesses a document saved in the document storage unit 21 and saves the document name along with document update information in the meta-information table 23 a of the meta-information storage unit 23.

[0028] FIG. 3, FIG. 4, and FIG. 5 are examples of the meta-information table 23 a for use with the preferred embodiment of the present invention.

[0029] FIG. 3 shows an example in which the meta-information table 23 a is generated periodically and in which the update information comprises a date and time. The update date/time of each document 1, 2, . . . is checked at each check date, and the update date/time of each document is written into the meta-information table 23 a.
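By way of illustration only, the periodic generation described for FIG. 3 might be sketched in Python as follows. The function name `record_check`, the dictionary layout, and the sample file names and dates are assumptions made for this sketch; the specification does not prescribe any particular data layout.

```python
from datetime import datetime


def record_check(table, check_time, document_mtimes):
    """Write each document's update date/time into the table for one check date.

    table: dict mapping check date/time -> {document name: update date/time}
    document_mtimes: dict mapping document name -> last update date/time
    Both layouts are hypothetical and chosen only for this sketch.
    """
    # Store a copy so later changes to the caller's dict do not alter history.
    table[check_time] = dict(document_mtimes)
    return table


# Hypothetical sample data: two check dates, with doc1.html updated in between.
table = {}
record_check(table, datetime(1998, 1, 10),
             {"doc1.html": datetime(1998, 1, 5, 9, 30),
              "doc2.html": datetime(1998, 1, 9, 18, 0)})
record_check(table, datetime(1998, 1, 11),
             {"doc1.html": datetime(1998, 1, 10, 12, 0),
              "doc2.html": datetime(1998, 1, 9, 18, 0)})
```

In this sketch each check date keeps its own snapshot, so the update date/time seen at an earlier check remains available for comparison.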

[0030] FIG. 4 shows an example in which the meta-information table 23 a is generated periodically and in which update information comprises version information pertaining to individual documents. The version information pertaining to each document 1, 2, . . . is checked at each check date, and written into the meta-information table 23 a.

[0031] FIG. 5 shows an example in which the meta-information table 23 a is modified when a document has been updated. File names associated with the serial numbers and a differentiation (for example: updated/deleted/new) are written into the meta-information table 23 a in order of update date/time sequence (as indicated by the serial number). Thus, in effect, the meta-information table 23 a forms a modification history of the documents on the document storage unit 21.
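By way of illustration only, a modification history of the kind described for FIG. 5 might be kept as an append-only log in which each entry receives the next serial number. The class name `ModificationLog`, its fields, and the tuple layout are assumptions made for this Python sketch and are not taken from the specification.

```python
from dataclasses import dataclass, field

# The three differentiations named in the description.
KINDS = ("updated", "deleted", "new")


@dataclass
class ModificationLog:
    # Each entry is (serial number, file name, differentiation).
    entries: list = field(default_factory=list)
    next_serial: int = 1

    def record(self, file_name, kind):
        """Append one modification and return the serial number assigned to it."""
        if kind not in KINDS:
            raise ValueError(f"unknown differentiation: {kind!r}")
        serial = self.next_serial
        self.entries.append((serial, file_name, kind))
        self.next_serial += 1
        return serial
```

Because serial numbers only ever increase, an information collection apparatus that remembers the last serial number it has seen can later ask for everything recorded after it.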

[0032] Referring once again to FIG. 2, the process by which the data transmission/reception unit 25 transmits and receives data to and from the information collection apparatus 12 and the client application 13 will be explained. When the client application 13 makes an information acquisition request (such as for a web page) to the information presentation apparatus 11, the data transmission/reception unit 25 retrieves the relevant information from the document storage unit 21 and sends the retrieved information to the client application 13.

[0033] On the other hand, when the information collection apparatus 12 makes a collection request to the information presentation apparatus 11, the time when the previous collection was carried out is also specified (if the meta-information table 23 a as shown in FIG. 3 is used). The data transmission/reception unit 25 receives the request and passes it to the meta-information management unit 22. The meta-information management unit 22 searches the meta-information table 23 a for documents modified since the specified time and generates a collected documents table. The data transmission/reception unit 25 returns the created collected documents table to the information collection apparatus 12.

[0034] FIG. 6 is an example of a collected documents table for use with the preferred embodiment of the present invention. The collected documents table registers updated documents, deleted documents, and newly created documents.

[0035] FIG. 7 is an example of an optional information collection apparatus table 24 a for use with the preferred embodiment of the present invention. The names of information collection apparatuses 12 may be registered in the information collection table 24 a of the information collection apparatus storage unit 24. The information presentation apparatus 11 requests each registered information collection apparatus 12 to issue a collection request when the meta-information table 23 a has been modified. This can further decrease network traffic in that web robots only access WWW servers when informed that new information is present.
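By way of illustration only, the registration and notification behavior described for FIG. 7 might be sketched in Python as follows. The class name `CollectorRegistry`, the `send` callback, and the message text are assumptions made for this sketch; the specification does not state how the request is delivered to an information collection apparatus.

```python
class CollectorRegistry:
    """Holds the names of registered information collection apparatuses."""

    def __init__(self, send):
        # send(name, message) is a hypothetical transport supplied by the caller.
        self._send = send
        self._names = []

    def register(self, name):
        # Register each name once, preserving registration order.
        if name not in self._names:
            self._names.append(name)

    def on_table_modified(self):
        """Ask every registered apparatus to issue a collection request."""
        for name in self._names:
            self._send(name, "please issue a collection request")
        return list(self._names)
```

A caller would invoke `on_table_modified()` whenever the meta-information table changes, so that an apparatus contacts the server only when there is something new to collect.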

[0036] FIG. 8 is a flowchart of an information collection process in accordance with a preferred embodiment of the present invention. This particular process utilizes the meta-information table 23 a shown in FIG. 3. The process starts in Step S0. In Step S1, the information collection apparatus 12 makes a collection request to the information presentation apparatus 11, specifying a time T when a previous collection was made. This request is issued periodically at set intervals or at the request of the information presentation apparatus 11, as discussed above. When the data transmission/reception unit 25 receives the request, it transmits the request to the meta-information management unit 22.

[0037] In Step S2, the meta-information management unit 22 checks whether information received prior to the time T remains in the meta-information table 23 a (see FIG. 3). If such information does not remain, the process goes to Step S3 and the meta-information management unit 22 acquires all documents on behalf of the information collection apparatus 12.

[0038] If information received prior to the time T remains in the meta-information table 23 a, the process goes to Step S4 and “I” is set to equal 1. Next, in Step S5, the collected documents table (shown in FIG. 6) is generated.

[0039] Thereafter, in Step S6, assuming a number of documents N, a check is made as to whether I≦N (i.e., to determine if all documents have been processed). If I≦N, the process goes to Step S7, in which the meta-information table 23 a is referenced and a check is made as to whether document I has been modified since time T. If document I has not been modified since time T, the process goes to Step S9 and I is incremented to I+1. Thereafter, the process repeats from Step S6.

[0040] If, in Step S7, document I has been modified since time T, the process goes to Step S8 and a reference to document I is added to the collected documents table. Thereafter, the process goes to Step S9 (I=I+1) and returns to Step S6.

[0041] When I becomes greater than N in Step S6, (all documents have been processed), the process goes to Step S10 and the data transmission/reception unit 25 returns the completed collected documents table to information collection apparatus 12. The process ends in Step S11. Thereafter, based on the collected documents table sent from the information presentation apparatus 11, the information collection apparatus 12 acquires only the required updated or new documents.
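By way of illustration only, Steps S2 through S10 of FIG. 8 might be sketched in Python as follows. The function name `collect_since`, the list-of-pairs layout, and the `history_start` parameter (standing in for the check of whether information prior to time T remains) are assumptions made for this sketch. For brevity the sketch lists document names only and omits the updated/deleted/new differentiation of FIG. 6.

```python
from datetime import datetime


def collect_since(meta_table, history_start, t):
    """Return the names of documents to be acquired by the collector.

    meta_table: list of (document name, last modification date/time) pairs
    history_start: earliest date/time for which the table still holds records
    t: time T of the previous collection, as specified in the request
    """
    # Step S2: does information from before time T remain in the table?
    if history_start > t:
        # Step S3: it does not, so all documents are targets.
        return [name for name, _ in meta_table]

    i = 1                      # Step S4
    collected = []             # Step S5: empty collected documents table
    n = len(meta_table)
    while i <= n:              # Step S6
        name, modified_at = meta_table[i - 1]
        if modified_at > t:    # Step S7: modified since time T?
            collected.append(name)   # Step S8
        i += 1                 # Step S9
    return collected           # Step S10: table returned to the collector
```

With hypothetical documents last modified on the 5th, 12th and 20th of a month and a previous collection on the 10th, only the latter two would be listed.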

[0042] The flowchart in FIG. 8 pertains to a situation in which the meta-information table 23 a as shown in FIG. 3 was used. However, a collected documents table can also be generated using a similar process when version information of individual documents, as shown in the meta-information table 23 a in FIG. 4 is used.
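By way of illustration only, one way the version-based table of FIG. 4 could drive a similar process is sketched below in Python. The specification does not detail this variant, so the whole approach is an assumption: the sketch compares the versions recorded at the latest check date at or before time T with those recorded at the most recent check date. The function name `collect_by_version` and the dictionary layout are likewise hypothetical.

```python
from datetime import datetime


def collect_by_version(version_table, t):
    """Return names of documents whose version changed after time T.

    version_table: dict mapping check date/time -> {document name: version}
    t: time T of the previous collection
    """
    if not version_table:
        return []
    check_dates = sorted(version_table)
    latest = version_table[check_dates[-1]]

    # Analogue of Step S2: is there a snapshot from at or before time T?
    earlier = [d for d in check_dates if d <= t]
    if not earlier:
        # Analogue of Step S3: no baseline remains, so list every document.
        return sorted(latest)

    baseline = version_table[earlier[-1]]
    # A document is listed if it is new or its version differs from the baseline.
    return sorted(name for name, version in latest.items()
                  if baseline.get(name) != version)
```

Documents absent from the baseline snapshot are treated as newly created; detection of deleted documents is left out of this sketch.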

[0043] FIG. 9 is a flowchart of another information collection process in accordance with a preferred embodiment of the present invention. This particular process utilizes the meta-information table 23 a shown in FIG. 5. The process starts in Step S100. In Step S101, the information collection apparatus 12 makes a collection request to the information presentation apparatus 11, specifying a serial number A for the previous collection. When the data transmission/reception unit 25 receives the request, it transmits the request to the meta-information management unit 22.

[0044] In Step S102, the meta-information management unit 22 checks whether information received prior to the modification indicated by serial number A remains in the meta-information table 23 a (see FIG. 5). If such information does not remain, the process goes to Step S103 and the meta-information management unit 22 responds by acquiring all documents on behalf of the information collection apparatus 12.

[0045] If, in Step S102, information received prior to the modification indicated by serial number A remains in the meta-information table 23 a the process goes to Step S104 and “I” is set to equal 1. Thereafter, in Step S105, the collected documents table (shown in FIG. 6) is generated.

[0046] In Step S106, assuming a number of documents N, a check is made as to whether I≦N, and if I≦N the process goes to Step S107. In Step S107, the meta-information table 23 a is referenced and a check is made as to whether document I has been modified since the modification indicated by serial number A. If document I has not been modified since the modification indicated by serial number A, the process goes to Step S109 and I is incremented to I+1. The process then returns to Step S106. If, in Step S107, document I has been modified since the modification indicated by serial number A, the process goes to Step S108 and document I is added to the collected documents table. Thereafter, the process goes to Step S109 (I=I+1) and the process returns to Step S106.

[0047] When in Step S106, I becomes greater than N, the process goes to Step S110 and the data transmission/reception unit 25 returns the generated collected documents table to the information collection apparatus 12. The process ends in Step S111. Thereafter, based on the collected documents table sent from the information presentation apparatus 11, the information collection apparatus 12 acquires only the documents which are new or updated.
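By way of illustration only, Steps S102 through S110 of FIG. 9 might be sketched in Python as follows. The function name `collect_since_serial`, the tuple layout of the log, and the test used for Step S102 (the oldest retained serial number must not exceed A) are assumptions made for this sketch.

```python
def collect_since_serial(log, document_names, a):
    """Return names of documents modified after the modification with serial A.

    log: list of (serial number, file name, differentiation), ascending by serial
    document_names: list of the N document names to examine
    a: serial number A specified in the collection request
    """
    # Step S102: does information prior to modification A remain in the table?
    if not log or log[0][0] > a:
        # Step S103: it does not, so all documents are targets.
        return list(document_names)

    # Names touched by any entry recorded after serial number A.
    modified = {name for serial, name, _ in log if serial > a}

    i = 1                          # Step S104
    collected = []                 # Step S105: empty collected documents table
    n = len(document_names)
    while i <= n:                  # Step S106
        name = document_names[i - 1]
        if name in modified:       # Step S107
            collected.append(name)     # Step S108
        i += 1                     # Step S109
    return collected               # Step S110
```

Under this sketch, a request with A equal to the newest serial number yields an empty table, while a request whose serial number predates the retained history falls back to listing every document.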

[0048] When, in either of the foregoing processes (FIG. 8 or FIG. 9), the collected documents table is returned, the collected documents table may be compressed, thereby decreasing transmission time to the information collection apparatus 12. In addition, when the collected documents table is returned to the information collection apparatus 12, the up-to-date documents stored in the document storage unit 21 may be sent to the information collection apparatus 12 together with the collected documents table.
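By way of illustration only, compression of the collected documents table before transmission might be sketched in Python as follows. The specification does not name a serialization format or compression method; JSON and zlib (from the Python standard library) are assumptions chosen for this sketch, as are the function names `pack_table` and `unpack_table`.

```python
import json
import zlib


def pack_table(table):
    """Serialize the collected documents table and compress it for sending."""
    return zlib.compress(json.dumps(table).encode("utf-8"))


def unpack_table(payload):
    """Reverse pack_table on the information collection apparatus side."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))
```

A table with many similar entries, such as file names sharing a common prefix and a small set of status values, tends to compress well because of its repetition.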

[0049] Although preferred embodiments of the present invention have been shown and described along with some possible variations, it will be appreciated by those skilled in the art that further changes and variations may be used without departing from the principles and spirit of the invention, the scope of which is defined in the claims and their equivalents.

[0050] Generally, in the present invention, a meta-information management function is provided in an information presentation apparatus. The meta-information enables an information collection apparatus to carry out an information collection process efficiently by indicating which documents have been modified since a previous request. The burden on the information presentation apparatus as well as on a network is alleviated. Consequently, the number of information updates can be increased, allowing searches to be carried out with newer data. 

What is claimed is:
 1. A web page server comprising: a document storage unit that stores web pages; a meta-information table that stores information including a status of each web page stored in the document storage unit; an information collection table that stores names of web robots that monitor the content of the web page server; and management software that provides the following functions: monitoring modification of the web pages in the document storage unit; updating the meta-information table based on the monitoring function; creating, in response to a collection request by the web robots, a list of web pages in the document storage that have been modified since a previous collection request so that the collection request only retrieves previously unretrieved documents, thereby reducing the load on the web page server; and retrieving the names of web robots from the information collection table when the monitoring function detects modification of web pages and transmitting a message to each web robot indicating that the web robot should issue a collection request.
 2. A web page server, as set forth in claim 1, wherein the monitoring function stores a date and time of modification in the meta-information table when a modification of a web page is detected.
 3. A web page server, as set forth in claim 1, wherein the monitoring function stores a current version of each web page in the meta-information table.
 4. A web page server, as set forth in claim 1, wherein the monitoring function stores a modification history of the web pages.
 5. A web page server, as set forth in claim 4, wherein each entry in the modification history is provided with a serial number indicating an order in the modification history.
 6. A web page server, as set forth in claim 1, wherein the list includes names of modified web pages and an indication of the status of each modified web page.
 7. A web page server, as set forth in claim 1, wherein the monitoring function monitors for the update, creation or deletion of web pages.
 8. An information system comprising: a client that requests information including web pages; an information collection apparatus that uses a web robot to retrieve web pages on a web page server, indexes the retrieved web pages, and provides a search facility for the client; and a web page server comprising: a document storage unit that stores web pages for access by the client and the web robot; a meta-information table that stores information including a status of each web page stored in the document storage unit; and managing software that provides the following functions: monitoring modification of the web pages in the document storage unit; updating the meta-information table based on the monitoring function; and creating, in response to a collection request from the web robot, a list of web pages in the document storage that have been modified since a previous collection request so that the collection request only retrieves previously unretrieved documents, thereby reducing the load on the web page server.
 9. A method of operating a web page server, comprising: storing web pages; storing status information of each stored web page; storing names of web robots that monitor the content of the web page server; monitoring modification of the stored web pages; updating the status information based on the monitoring of the modification of the stored web pages; creating, in response to a collection request by one of the web robots, a list of web pages in the document storage that have been modified since a previous collection request so that the collection request only retrieves previously unretrieved documents, thereby reducing the load on the web page server; and retrieving the names of web robots when the monitoring function detects modification of web pages and transmitting a message to each web robot indicating that the web robot should issue a collection request.
 10. The method of claim 9, further comprising: storing a date and time of modification when a modification of a web page is detected.
 11. The method of claim 9, further comprising: storing a current version of each web page in the web page server.
 12. The method of claim 9, further comprising: storing a modification history of the web pages.
 13. The method of claim 9, further comprising: providing a serial number indicating an order in the modification history.
 14. The method of claim 9, further comprising: including names of modified web pages and an indication of the status of each modified web page in the created list.
 15. The method of claim 9, further comprising: monitoring update, creation or deletion of web pages.
 16. A method of collecting information for a client from a web page server using a web robot, the method comprising: storing web pages in the web page server for access by the client and the web robot; storing information including a status of each web page stored in the web page server; collecting information using a web robot to retrieve web pages on the web page server; indexing the retrieved web pages to provide a search facility for the client; providing an information request from the client to the web robot, the information request including a request for web pages; monitoring modification of the web pages stored in the web page server; maintaining modification information based on the monitoring of the modification of the stored web pages; and creating, in response to a collection request from the web robot, a list of web pages stored in the web page server that have been modified since a previous collection request so that the collection request only retrieves previously unretrieved documents, thereby reducing the load on the web page server. 